Patent abstract:
The present invention relates to systems, methods and software for a data obfuscation framework for user applications. An exemplary method includes providing user content to a classification service configured to process the user content to classify portions of the user content as comprising sensitive content, and receiving indications of user content containing sensitive content from the classification service. The method includes presenting graphical indications in a user interface of the user application that annotate the user content as containing the sensitive content, and presenting obfuscation options in the user interface to mask the sensitive content within at least a selected portion of the user content. In response to a user selection of at least one of the obfuscation options, the method includes replacing the associated user content with obfuscated content that maintains a data schema of the associated user content.
Publication number: BR112019017319A2
Application number: R112019017319-6
Filing date: 2018-03-16
Publication date: 2020-03-31
Inventors: David ALLEN Phillip; Cristina Oropeza Hernandez Sara
Applicant: Microsoft Technology Licensing, Llc
Primary IPC:
Patent description:

Invention Patent Descriptive Report for OBFUSCATING USER CONTENT IN STRUCTURED USER DATA FILES.
BACKGROUND [001] Various user productivity applications allow data entry and analysis of user content. These applications can provide content creation, editing and analysis using spreadsheets, presentations, text documents, mixed media documents, message formats or other user content formats. Among this user content, various textual, alphanumeric or other character-based information may include confidential data that users or organizations may not want to include in published or distributed work. For example, a spreadsheet can include social security numbers (SSNs), credit card information, health care identifiers, or other information. Although the user who enters this data or the user content may be authorized to view the confidential data, other entities or distribution endpoints may not have this authorization.
[002] Information protection and management techniques can be referred to as data loss protection (DLP), which attempts to prevent misappropriation and misallocation of this confidential data. In certain content formats or content types, such as those included in spreadsheets, slide-based presentations, and graphical diagramming applications, user content can be included in multiple cells, objects, or other structured or semi-structured data entities. In addition, confidential data can be divided between more than one data entity. Difficulties can arise when trying to identify and protect against loss of confidential data when these documents include confidential data.
Petition 870190080864, of 8/20/2019, p. 17/85
OVERVIEW [003] Systems, methods and software for data obfuscation frameworks for user applications are provided herein. An exemplary method includes providing user content to a classification service configured to process the user content to classify portions of the user content as comprising sensitive content, and receiving indications of user content containing sensitive content from the classification service. The method includes displaying graphical indications in a user interface of the user application that annotate the user content as containing sensitive content, and presenting obfuscation options in the user interface to mask the sensitive content in at least a selected part of the user content. In response to a user selection of at least one of the obfuscation options, the method includes replacing the associated user content with obfuscated content that maintains a data schema of the associated user content.
[004] This Overview is provided to introduce a selection of concepts in a simplified way, which are described below in the Detailed Description. It can be understood that this Overview is not intended to identify key characteristics or essential characteristics of the claimed matter, nor is it intended to be used to limit the scope of the claimed matter.
BRIEF DESCRIPTION OF THE DRAWINGS [005] Many aspects of the invention can be better understood with reference to the following drawings. Although several implementations are described in connection with these drawings, the invention is not limited to the implementations described herein. On the contrary, the intention is to cover all alternatives, modifications and equivalents.
[006] Figure 1 illustrates a data loss protection environment in one example.
[007] Figure 2 illustrates elements of a data loss protection environment in an example.
[008] Figure 3 illustrates elements of a data loss protection environment in one example.
[009] Figure 4 illustrates operations of data loss protection environments in an example.
[0010] Figure 5 illustrates operations of data loss protection environments in an example.
[0011] Figure 6 illustrates operations of data loss protection environments in an example.
[0012] Figure 7 illustrates operations of data loss protection environments in an example.
[0013] Figure 8 illustrates data limit operations for data loss protection environments in an example.
[0014] Figure 9 illustrates a computing system suitable for implementing any of the architectures, processes, platforms, services and operational scenarios described here.
DETAILED DESCRIPTION [0015] User productivity applications allow entry of user data and provide creation, editing and analysis of content using spreadsheets, slides, vector graphics, documents, emails, message content, databases or other application data formats and types. Among the user content, various textual, alphanumeric or other character-based information can be included. For example, a spreadsheet can include social security numbers (SSNs), credit card information, health care identifiers, passport numbers, or other information. Although the user who enters this data or user content may have
authorization to view the confidential data, other entities or distribution endpoints may not have this authorization. Various privacy policies or data privacy rules can be established to indicate which types of data or user content are confidential in nature. The advanced data loss protection (DLP) measures discussed here can be incorporated to try to prevent misappropriation and misallocation of this confidential data.
[0016] In certain content formats or content types, such as those included in spreadsheets, slide-based presentations and graphical diagramming applications, user content can be included in multiple cells, objects or other structured or semi-structured data entities. In addition, confidential data can be divided between more than one element or data entry. The examples presented here allow for improved identification of sensitive data in user data files that include structured data elements. In addition, the examples presented here provide enhanced user interfaces to alert users to confidential data. These user interface elements can include individual tagging of data elements containing sensitive data, as well as limits for alerts when editing content.
[0017] In a sample application that uses structured data elements, such as a spreadsheet application, data can be inserted into cells that are arranged in columns and rows. Each cell can contain user data or user content and can also include one or more expressions that are used to perform calculations, which can refer to data entered by the user in one or more other cells. Other user applications, such as slide show applications, may include content from
user on more than one slide, as well as on objects included in those slides.
[0018] Advantageously, the examples and implementations provided here provide improved operations and structures for data loss protection services. These enhanced operations and structures have technical effects for faster identification of confidential content in documents and especially for structured documents, such as spreadsheets, presentations, graphic designs and the like. In addition, multiple applications can share a single rating service that provides detection and identification of sensitive content in user data files across multiple end-user applications and platforms. Annotation and obfuscation processes at the end user level also offer significant advantages and technical effects on user interfaces for applications. For example, users can be presented with graphical annotations of confidential content and pop-up dialog boxes that present various options for obfuscation or masking. Various improved annotation limits can also be established to dynamically indicate sensitive content to users to make editing user content and obfuscating sensitive data more efficient and compatible with various data loss protection policies and rules.
[0019] As a first example of a data loss protection environment for a user application, figure 1 is provided. Figure 1 illustrates the data loss protection environment 100 in one example. Environment 100 includes user platform 110 and data loss protection platform 120. The elements in figure 1 can communicate through one or more physical or logical communication links. In figure 1, links 160-161 are shown. However, it should be understood that these links are only exemplary and
that one or more additional links may be included, which may include wireless, wired, optical or logical portions.
[0020] A data loss protection framework can include a local portion for a specific user application and a shared portion used across many applications. User platform 110 provides an application environment for a user to interact with elements of user application 111 through user interface 112. During user interaction with application 111, content entry and content manipulation can be performed. The application data loss protection (DLP) module 113 can provide portions of the functionality for annotating and replacing confidential data in application 111. The DLP application module 113 is local to user platform 110 in this example, but can be separate from or integrated with application 111. The DLP application module 113 can provide annotation and replacement of confidential data for users and application 111. The data loss protection platform 120 provides the shared portion of a data loss protection framework, namely shared DLP service 121, which many applications can share, such as applications 190 with associated local DLP portions 193.
[0021] In operation, application 111 provides user interface 112, through which users can interact with application 111 to insert, edit and manipulate user content, which can be loaded from one or more data files or entered via user interface 112. In figure 1, a spreadsheet workbook is shown with cells arranged in rows and columns. As part of application 111, a data loss protection service is provided that identifies sensitive user content and allows users to replace the user's confidential content with secure text or data. Confidential content includes content that may have
privacy concerns, privacy policies/rules, or other properties for which dissemination would be unintended or unwanted. Data loss in this context refers to the dissemination of private or confidential data to unauthorized users or endpoints.
[0022] To identify confidential content, application 111 provides breakdowns of the user content, in chunks or pieces, to a data loss protection service. In figure 1, content portions 140 are shown, with individual content portions 141 to 145 being provided over time to the DLP 121 service. Typically, application 111 can process user content to split the user content into portions during inactive periods, such as when one or more processing threads related to application 111 are inactive or below activity thresholds. As will be discussed herein, structured user content is transformed into a 'flat' or unstructured arrangement during the splitting process. This unstructured arrangement has several advantages for processing by the DLP 121 service.
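The splitting step described above can be illustrated with a short sketch. This is a hypothetical example only, assuming spreadsheet rows as the structured input; the names `Chunk` and `flatten_cells` are illustrative and do not come from the patent.

```python
# Hypothetical sketch of the content-splitter step: structured
# spreadsheet cells are flattened into linear "chunks", each carrying
# an offset into the flattened text and a length.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str     # flat, structure-free user content
    offset: int   # position of the chunk within the flattened document
    length: int   # size of the chunk

def flatten_cells(rows):
    """Flatten rows of spreadsheet cells into chunks, one per row."""
    chunks = []
    offset = 0
    for row in rows:
        text = " ".join(cell for cell in row if cell)
        chunks.append(Chunk(text=text, offset=offset, length=len(text)))
        offset += len(text) + 1  # +1 accounts for a separator between rows
    return chunks

rows = [["Name", "SSN"], ["Jane Doe", "123-45-6789"]]
for c in flatten_cells(rows):
    print(c.offset, c.length, c.text)
```

The offset/length pair is what lets the service report findings without retaining the document structure itself.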
[0023] The DLP 121 service then processes each portion or 'piece' of user content individually to determine whether the portions contain confidential content. Various classification rules 125, such as data schemas, data patterns or privacy policies/rules, can be introduced into the DLP 121 service to identify confidential data. After the DLP 121 service analyzes each individual piece of user content, the location offsets of sensitive data in the user data file are determined and reported to the DLP application service 113. A mapper function in the DLP application service 113 determines a structural relationship between chunk offsets and the structure
of the document. Indications of location offsets, lengths of confidential data, and types of confidential data can be provided to application 111, as seen, for example, in confidential data indications 150. The location offsets indicated by the DLP 121 service may not produce an exact or specific location among the structural elements of the user data file for the confidential content. In these cases, a mapping process can be employed by the DLP application service 113 of application 111 to determine the specific structural elements that contain the confidential data.
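A minimal sketch of the mapper role just described might look as follows. This is an assumption-laden illustration: `ChunkMapper`, its fields, and the use of a row index as the structural element are invented for the example, not taken from the patent.

```python
# Illustrative mapper sketch: chunk metadata recorded at split time is
# used to translate an (offset) hit reported by the shared service back
# to a coarse structural location (here, a row index).
import bisect

class ChunkMapper:
    def __init__(self, session_id):
        self.session_id = session_id  # persists while the document is open
        self.offsets = []             # sorted chunk start offsets
        self.rows = []                # structural element (row index) per chunk

    def record(self, offset, row_index):
        self.offsets.append(offset)
        self.rows.append(row_index)

    def coarse_location(self, hit_offset):
        """Return the row whose chunk contains the reported offset.

        Assumes hit_offset >= the first recorded offset."""
        i = bisect.bisect_right(self.offsets, hit_offset) - 1
        return self.rows[i]

mapper = ChunkMapper(session_id="doc-session-1")
mapper.record(0, row_index=0)      # chunk for row 0 starts at offset 0
mapper.record(9, row_index=1)      # chunk for row 1 starts at offset 9
print(mapper.coarse_location(18))  # offset 18 falls within row 1's chunk
```

Keeping only offsets and row indices means the mapper never needs to store the sensitive content itself.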
[0024] Once specific locations are determined, application 111 can annotate sensitive data in the user interface
112. This annotation may include global or individual flagging or marking of the confidential data. Annotations can comprise 'policy tips' presented in a user interface. Users can then be given one or more options to obfuscate the user content or make it non-identifiable as the original confidential content. Various notification limits for confidential content can be set to trigger at certain counts or quantities of the confidential data present in the user data file.
[0025] In one example, user data file 114 includes contents 115, 116 and 117 in particular cells of user data file 114, which can be associated with a specific worksheet tab or page in the spreadsheet workbook. Various contents can be included in the associated cells, and that content can include potentially confidential data, such as the examples seen in figure 1 for SSNs, telephone numbers and addresses. Some of this content may cross structural boundaries in the user data file, such as spanning multiple cells or spanning multiple graphic objects. If the 'chunking' breaks the data up into rows
or groupings of rows, the flat representations (that is, stripped of any structural content) may still allow confidential data to be identified in one or more cells.
[0026] The elements of each of user platform 110 and DLP platform 120 can include communication interfaces, network interfaces, processing systems, computer systems, microprocessors, storage systems, storage media or some other processing devices or software systems, and can be distributed across multiple devices or across geographic locations. Examples of elements of each of user platform 110 and DLP platform 120 may include software such as an operating system, applications, logs, interfaces, databases, utilities, drivers, network software and other software stored on a computer-readable medium. The elements of each of user platform 110 and DLP platform 120 may comprise one or more platforms hosted by a distributed computing system or cloud computing service. Elements of each of user platform 110 and DLP platform 120 may comprise logical interface elements, such as software-defined interfaces and application programming interfaces (APIs).
[0027] Elements of user platform 110 include application 111, user interface 112 and the DLP application module
113. In this example, application 111 comprises a spreadsheet application. It should be understood that user application 111 can comprise any user application, such as productivity applications, communication applications, social media applications, game applications, mobile applications or other applications. User interface 112 comprises user interface graphics that can produce output for display to a user and receive input from a user. User interface 112 can comprise
elements discussed below in figure 9 for user interface system 908. The DLP application module 113 comprises one or more software elements configured to split content for delivery to a classification service, annotate data that is classified as confidential, and obfuscate confidential data, among other operations.
[0028] The elements of the DLP platform 120 include the DLP service 121. The DLP service 121 includes an external interface in the form of an application programming interface (API) 122, although other interfaces may be employed. The DLP service 121 also includes tracker 123 and classification service 124, which are discussed in more detail below. API 122 can include one or more user interfaces, such as web interfaces, APIs, terminal interfaces, console interfaces, shell command line interfaces, extensible markup language (XML) interfaces, among others. Tracker 123 maintains counts or amounts of sensitive data found for a specific document in the flat portions of structured user content, and also keeps track of location offsets within the flat portions of structured user content that correspond to sensitive data locations in the structured user content. Tracker 123 can also perform threshold analysis to determine when threshold quantities of sensitive data are found and should be annotated by the DLP application module 113. However, in other examples, the threshold/count portions of the DLP service 121 can be included in the DLP application module 113. The classification service 124 analyzes flat user content to determine the presence of sensitive data and can employ various inputs that define rules and policies to identify sensitive data. Elements of the DLP application module 113 and the shared DLP service 121 can be configured in arrangements or distributions different from those shown in figure
1, such as when portions of the shared DLP service 121 are included in the DLP application module 113 or in application 111, among other configurations. In one example, portions of the shared DLP service 121 comprise a dynamic link library (DLL) included on user platform 110 for use by application 111 and the DLP application module 113.
[0029] Links 160-161, together with other links not shown between the elements of figure 1 for clarity, may each comprise one or more communication links, such as one or more network links comprising wired or wireless links. Links can include multiple logical, physical, or application programming interfaces. Examples of communication links can use metal, glass, optics, air, space or some other material as a transport medium. Links can use various communication protocols, such as Internet Protocol (IP), Ethernet, hybrid fiber coax (HFC), synchronous optical network (SONET), asynchronous transfer mode (ATM), time-division multiplexing (TDM), circuit switching, communication signaling, wireless communications or some other communication format, including combinations, improvements or variations thereof. The links can be direct links or can include intermediary networks, systems or devices, and can include a logical network link carried over multiple physical links.
[0030] For further discussion of the elements and operation of environment 100, figure 2 is presented. Figure 2 is a block diagram illustrating example configuration 200 of the DLP application module 113, which highlights example operations of the DLP application module 113, among other elements. In figure 2, the DLP application module 113 includes content splitter 211, annotator 212, mapper 213 and obfuscator 214. Each of elements 211 through 214 can comprise software modules used by the DLP application module 113 to operate as discussed below.
[0031] In operation, user content is provided for the DLP 113 application module, such as a spreadsheet file or workbook as seen in figure 1 for the user data file
114. This user data file can be arranged in a structured or semi-structured format, such as cells arranged by rows and columns for an example spreadsheet. Other data formats can be employed, such as slide shows with pages/slides and many individual graphic objects, vector drawing programs with several objects on several pages, word processing documents with several objects (tables, text boxes, images), databases, web page content, or other formats, including combinations thereof. User data files can contain confidential content or confidential data. This sensitive data can include any user content that fits one or more data standards or schemas. Examples of sensitive data types include social security numbers, credit card numbers, passport numbers, addresses, phone numbers or other information.
[0032] In parallel with the editing or viewing of the user data file, content splitter 211 subdivides the user content into one or more portions or 'chunks' that are in a flat form of the original/native hierarchical or structured form. Content splitter 211 can then provide these content chunks to the shared DLP service 121, along with chunk metadata for each chunk. Chunk metadata can indicate various chunk properties, such as a chunk's location offset in the total content and a chunk length. The location offset corresponds to a location of the chunk in relation to the user's overall document/file, and the length
corresponds to the size of the chunk.
[0033] The shared DLP service 121 individually analyzes the content chunks to identify sensitive data among the flat user content of the chunks, and provides indications of the confidential data back to the DLP application module 113. In some examples discussed below, several thresholds are applied to counts or amounts of confidential data before the indications are provided to the DLP application module 113. The indications comprise offsets for each of the chunks containing confidential data, lengths of the chunks and, optionally, data type indicators or data schemas associated with the confidential data. The confidential data indications can be used to determine actual or specific locations of the confidential content among the structured data of the user data file. The data type indicators can be symbolically or numerically encoded indicators, such as integer values, which are referenced against a list of indicators that mapper 213 can use to identify data types for annotation.
[0034] Mapper 213 can be used to convert offsets and lengths into specific locations within a user document or file. Offsets and lengths correspond to specific chunk identities that are maintained by mapper 213 and stored in association with a session identifier. The session identifier can be a unique identifier that persists at least as long as the session during which the user has the document open or under view. Mapper 213 can be provided with chunk metadata from content splitter 211 to form mapped relationships between chunk offsets, lengths and session identifiers. In response to
the received indications of confidential data, mapper 213 can use the mapped relationships to identify the indicated coarse locations of the confidential data within a document corresponding to the chunk offsets and lengths. Since chunks can span more than one structural or hierarchical element of the user data file, mapper 213 can perform further localization processes to find the specific locations of the sensitive data in the user data file. [0035] For example, offsets can indicate coarse locations, such as a specific row or specific column in a spreadsheet. To determine a specific location, such as a cell within the indicated row or column, mapper 213 can use the offsets/lengths together with local knowledge of the structured data and of the user data file itself to locate the confidential content among the structured data. Mapper 213 determines where in the user data file the chunks originated, such as rows, columns and worksheet tabs for example spreadsheets, and associated slides/pages and objects for example slide shows. Other examples, such as word processing documents, may have less structure, so the content is more easily flattened and offsets may be based on document word counts or similar placement.
[0036] In some examples, specific locations are determined by searching for the confidential content within a particular coarse location. When multiple structural or hierarchical elements are implied by a particular offset, mapper 213 can search or iteratively walk each of the elements to locate the confidential data. For example, if there are 'n' levels of structure/hierarchy in a document, mapper 213 can search first through the upper levels of the hierarchy and then the lower levels. In spreadsheet examples, the hierarchy/structure can comprise worksheet tabs with associated rows and columns. In presentation document examples, the hierarchy/structure can include slides/pages with associated shapes/objects. Each worksheet tab and slide indicated by the offset can be walked to find the exact cells or objects that contain the confidential content. In other examples, the confidential data can be located by recreating one or more chunks associated with the coarse location and finding the confidential data within those recreated chunks to find its specific location.
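The iterative walk described above can be sketched briefly. The worksheet layout, the SSN pattern, and the function name are all assumptions made for illustration; a real hierarchy would have worksheet tabs above the rows and columns shown here.

```python
# Sketch of the iterative search: once a coarse location (e.g., a
# worksheet) is implied by an offset, each lower structural level is
# walked to find the exact cell holding the sensitive pattern.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_specific_cells(worksheet):
    """Walk rows then columns, returning coordinates of matching cells."""
    hits = []
    for r, row in enumerate(worksheet):    # upper level: rows
        for c, cell in enumerate(row):     # lower level: columns
            if cell and SSN_PATTERN.search(cell):
                hits.append((r, c))
    return hits

sheet = [["Name", "SSN"],
         ["Jane Doe", "123-45-6789"],
         ["John Roe", "987-65-4321"]]
print(find_specific_cells(sheet))  # [(1, 1), (2, 1)]
```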
[0037] Once the specific locations of the confidential data have been determined, annotator 212 can be employed to mark or otherwise flag the confidential data for a user. This annotation can take the form of a global flag or banner that indicates to the user that confidential content is present in the user data file. This annotation can also take the form of individual flags or marks placed next to the confidential data. As an example, figure 2 shows configuration 201 with a view of a spreadsheet user interface that has a workbook currently open for viewing or editing. A banner annotation 220 is shown, as well as individual cell annotations 221. Individual cell annotations 221 comprise graphical indications that annotate one or more portions of user content and comprise indicators positioned next to those portions that are selectable in user interface 112 to present obfuscation options.
[0038] A user can be presented with one or more options when a specific annotation is selected. Pop-up menu 202 can be presented, which includes several viewing/editing options, such as cut, copy and paste, among others. Pop-up menu 202 can also include obfuscation options. Selecting one of the obfuscation options can produce obfuscated content that maintains a data schema of the associated user content and comprises symbols selected to prevent identification of the associated user content while maintaining the data schema of the associated user content. In some examples, symbols are selected based in part on the data schema of the associated user content, among other considerations. For example, if the data schema includes a numerical data schema, letters can be used as obfuscation symbols. Likewise, if the data schema includes an alphabetical data schema, numbers can be used as obfuscation symbols. Combinations of letters and numbers, or other symbols, can be selected as obfuscation symbols for alphanumeric content.
[0039] In figure 2, a first obfuscation option includes replacing the confidential content with masked or otherwise obfuscated text, while a second obfuscation option includes replacing all content that fits a data pattern or schema similar to the currently selected annotated content. For example, if an SSN is included in a cell, a user may be presented with options to replace the digits in the SSN with 'X' characters, leaving the SSN data schema intact, that is, leaving the familiar 3-2-4 character arrangement separated by dash characters. In addition, another obfuscation option may include an option to replace all SSNs that conform to the selected SSN pattern with 'X' characters. It should be understood that different example obfuscation options can be presented
, and different characters can be used in the replacement process. However, regardless of the obfuscation characters used, the confidential data is made anonymous, sanitized, 'clean', or otherwise not identifiable as the original content.
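The schema-preserving masking described above can be sketched in a few lines. The digit-to-letter and letter-to-number swap rules follow the description; the function name and the specific replacement symbols ('X' and '0') are illustrative choices, not mandated by the patent.

```python
# Minimal sketch of schema-preserving masking: digits are replaced with
# a letter symbol while separators are kept, so an SSN keeps its
# familiar 3-2-4 dashed arrangement.
def obfuscate_preserving_schema(value):
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("X")   # numeric schema -> letter symbol
        elif ch.isalpha():
            out.append("0")   # alphabetic schema -> number symbol
        else:
            out.append(ch)    # keep separators: dashes, spaces, etc.
    return "".join(out)

print(obfuscate_preserving_schema("123-45-6789"))  # XXX-XX-XXXX
```

Because only the symbol class changes, downstream consumers can still recognize the field's shape without recovering the original value.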
[0040] Turning now to figure 3, example configuration 300 is shown to focus on aspects of the DLP service 121. In figure 3, the DLP service 121 receives portions of flat user content, provided in one or more content chunks by content splitter 211, along with chunk metadata that includes at least offsets in the total content and chunk lengths. Two example types of structured user content are shown in figure 3, namely spreadsheet content 301 and presentation/slide show content 302. Spreadsheet content 301 has a structure reflected in rows 321 and columns 322 that define individual cells. In addition, spreadsheet content 301 can have more than one worksheet tab 320, delimited by tabs at the bottom of the worksheet, and each worksheet tab can have a separate set of rows/columns. Each cell can contain user content, such as characters, alphanumeric content, text content, numeric content or other content. Slide show content 302 can have one or more slides or pages 323 that include a plurality of objects 324. Each object can have user content, such as characters, alphanumeric content, text content, numeric content or other content.
[0041] Content splitter 211 subdivides the user content into parts and removes any associated structure, for example, extracting any user content, such as text or alphanumeric content, from the cells or objects, and then arranging the extracted content in flat or linear chunks for delivery to the DLP service 121. These chunks and the chunk metadata are provided
to the DLP 121 service for the discovery of potential confidential data.
[0042] Once the individual chunks of user content are received by the DLP service 121, various processing is performed on the chunks by the classification service 124. In addition, tracker 123 maintains data records 332 comprising one or more data structures that list offsets/lengths and a session identifier for counts of confidential data found. Data records 332 are stored so that the DLP service 121 can provide offsets/lengths for chunks containing sensitive data back to a requesting application, to further locate and annotate any sensitive content found there.
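A sketch of the tracker's record-keeping might look as follows. The class name, field layout, and threshold method are assumptions for illustration; the patent does not specify this data structure.

```python
# Illustrative tracker sketch: per-session records of where sensitive
# data was found, with a count used later for threshold checks.
from collections import defaultdict

class Tracker:
    def __init__(self):
        # session identifier -> list of (offset, length) hit records
        self.records = defaultdict(list)

    def note_hit(self, session_id, offset, length):
        self.records[session_id].append((offset, length))

    def count(self, session_id):
        return len(self.records[session_id])

    def meets_threshold(self, session_id, threshold):
        """True once the session's hit count reaches the threshold."""
        return self.count(session_id) >= threshold

t = Tracker()
t.note_hit("s1", offset=9, length=11)
t.note_hit("s1", offset=42, length=11)
print(t.count("s1"), t.meets_threshold("s1", 2))  # 2 True
```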
[0043] The classification service 124 analyzes each piece against various classification rules 331 to identify confidential data or confidential content. The classification rules 331 may establish one or more predetermined data schemes defined by one or more expressions used to analyze representations of flattened pieces / data to identify portions of the pieces as being indicative of one or more predetermined content patterns or one or more predetermined content types.
[0044] Confidential content is typically identified based on a structural data pattern or data schema that is associated with the confidential content. These patterns or schemas can identify cases where the exact content of the chunks may differ but the data fits a pattern or arrangement that reflects sensitive data types. For example, an SSN can have a certain arrangement of data, with a predetermined number of digits grouped and separated by a predetermined number of dashes. The classification rules 331 may include several definitions and policies used to identify sensitive data. These classification rules may include privacy policies, data standards, data schemas and threshold policies. Privacy policies may indicate that certain potentially sensitive data should not be designated as confidential for an application due to company, organization or user policies, among other considerations. Threshold policies can establish minimum limits for finding sensitive data in the various chunks before the presence of sensitive data is reported to the application. The classification rules 331 can be established by users or policy makers, such as administrators.
[0045] In addition, classification service 124 can process the data content using one or more regular expressions handled by regular expression (regex) service 333. Regex service 333 can include regular expression processing and matching services, along with various regular expressions that a user or policy maker can implement to identify sensitive data. Additional examples for regex service 333 are discussed below in figure 7.
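A minimal sketch of this kind of regex-based classification follows. The SSN-style pattern and the `classify` helper are illustrative assumptions, not the actual rules 331; real rules would also carry thresholds and privacy policies.

```python
# Illustrative sketch: classify a flat piece against a predetermined
# data scheme expressed as a regular expression (here, an SSN-like
# pattern with dash or space separators).
import re

SSN_RULE = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")

def classify(chunk):
    """Return (offset, length) pairs, relative to the chunk text, for
    portions matching the sensitive-data scheme."""
    return [(m.start(), m.end() - m.start())
            for m in SSN_RULE.finditer(chunk["text"])]

chunk = {"offset": 100, "text": "id 123-45-6789 ok"}
hits = classify(chunk)  # offsets local to this chunk
# Convert to offsets relative to the total flattened content,
# as tracker 123 is described as doing with the piece metadata.
absolute = [(chunk["offset"] + off, ln) for off, ln in hits]
```

Only the resulting offsets and lengths need to be retained; as the text notes, the piece itself can be discarded after classification.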
[0046] As a specific example, classification process 341 illustrates several content pieces C1-C8 that are linearized versions of content originally held in a structural or hierarchical arrangement in a document or user data file. Classification service 124 processes these pieces to identify pieces that contain confidential data. If any sensitive data is found, indications can be provided to the application. The indications may comprise offsets and lengths for the confidential data, and are provided to mapper 213 to locate the confidential data within the structure of the user data file. The pieces themselves can be discarded by classification service 124 after each piece is processed to identify confidential data. Since the offsets and lengths allow the confidential data to be found within the original data file, and the original content remains in the data file (unless intervening edits have occurred), the actual pieces do not need to be saved after being processed.
[0047] To form the pieces, content splitter 211 groups alphanumeric content, such as text, into one or more linear data structures, such as strings or BSTRs (basic strings or binary strings). Classification service 124 processes the linear data structures and determines a list of results. The pieces are checked for confidential data, and portions of the linear data structures can be determined to have confidential content. Classification service 124, in conjunction with tracker 123, determines offsets/lengths corresponding to pieces containing confidential data within the linear data structures. These offsets may indicate coarse locations that can be converted back to specific locations in the original document (for example, a user data file) containing the user content. When pieces are received, tracker 123 can correlate each piece with the offset/length information indicated in the piece metadata. This offset/length information can be used by mapper 213 to reverse map back to the structure or hierarchy of the original document.
[0048] However, DLP service 121 typically has only a partial context back to the original document or user data file, as indicated by the offsets into the originally generated linear data structures. In addition, the linear data and user content may have been released/deleted by classification service 124 at the end of a classification process. This may mean that classification service 124 cannot search directly for the confidential content to specifically locate it within the original document, and even if classification service 124 could search for the precise confidential content, classification service 124 might still not be able to find it, because the 'chunking' algorithm can cross boundaries of constructs or hierarchical structures in the original document or data file. As a specific example, worksheet 320 in a spreadsheet document can have the text SSN 123 45 6789 spanning four adjacent cells. Advantageously, classification service 124 may find this text to comprise confidential content. However, due to this cross-boundary analysis, at the end of the policy rule evaluation classification service 124 normally does not have enough data to find the confidential content in the original document for presentation to a user. A user might be left with the incorrect impression that no confidential content was present.
[0049] In order to efficiently scan user content for confidential content, classification service 124 reads one piece of user content at a time during application idle time, performs a partial analysis and continues the process. When classification service 124 has finished reading all of the content, classification service 124 has only rough positions for confidential content in the original content, such as just a start/offset and a length. In order to efficiently map back to a structured or semi-structured document, a combination of techniques can be employed by mapper 213.
It should be noted that these techniques differ from how a spelling or grammar check might work, in part because the full content may be required, rather than just a word/phrase/paragraph, to understand whether the content has exceeded a limit.
[0050] For each level of physical hierarchy or structure present in the original document (i.e., worksheets in a workbook or slides in a presentation), mapper 213 uses an identifier to indicate the existence of the level in a mapping data structure, and subdivides the content by a reasonable number of hierarchy levels (i.e., rows on a worksheet, shapes on a slide) so that, as each element is processed, mapper 213 keeps track of the length of the original content and, based on the order of insertion into the map, the implicit start of that element. The identifier can be a durable identifier that persists between open instances of a specific document, or it can be different in each instance of the specific document. In some instances, calculations for amalgamating the presence/absence of sensitive content are retained until there is no remaining unprocessed content nor any pending edit that further changes the content.
[0051] Assuming that confidential content exists, mapper 213 receives from DLP service 121 a start and length for each piece of confidential content, and mapper 213 performs a search in the mapping data structure of identifiers and insertion offsets to find the most precisely mapped region containing the confidential content. For performance reasons, only a certain number of hierarchy levels can be tracked, so that a table within a shape within a slide, or a cell within a row within a worksheet, is not tracked individually. Therefore, a partial re-scan can be performed after the reverse mapping to find the precise location.
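The reverse mapping described above can be sketched as follows. This is an illustrative sketch only; the class name `Mapper`, the `track`/`locate` methods, and the per-row granularity are assumptions for the example, and a real implementation would finish with the partial re-scan inside the located element.

```python
# Illustrative sketch: record, in insertion order, the implicit start
# of each tracked element (here, rows of worksheets) in the flat text,
# then reverse map a coarse flat offset back to its element.
import bisect

class Mapper:
    def __init__(self):
        self.starts = []    # implicit start of each element in flat text
        self.elements = []  # identifier per element, e.g. (sheet, row)
        self._cursor = 0    # running total of flattened lengths

    def track(self, element_id, flat_length):
        """Record an element; its start is implied by the lengths seen
        so far, so only one integer per element must be stored."""
        self.starts.append(self._cursor)
        self.elements.append(element_id)
        self._cursor += flat_length

    def locate(self, flat_offset):
        """Find the tracked element containing a flat offset; a partial
        re-scan inside that element would then pinpoint the exact cell."""
        i = bisect.bisect_right(self.starts, flat_offset) - 1
        return self.elements[i]

m = Mapper()
m.track(("Sheet1", 0), 26)  # row 0 flattened to 26 characters
m.track(("Sheet1", 1), 31)
m.track(("Sheet2", 0), 18)
```

Because starts are implicit from insertion order, the map costs one entry per tracked element rather than one per cell.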
[0052] In a specific example, a workbook can have 20 worksheets, but millions of rows, and each of the millions of rows can have 50 columns of user data. For a relatively small number of pieces of confidential data in this content (that is, one sheet has only one column with confidential data), the sorting process could become extremely memory intensive if 20 * 1 million * 50 'length + offset' pairs had to be remembered. Removing the last dimension is a 50x savings in memory, for a small computing cost when sensitive data is actually being identified in the original document. Advantageously, a small memory footprint can be maintained to invert the start/length map back to the original content.
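The arithmetic behind the claimed savings can be checked directly; the figures below are the ones given in the paragraph above.

```python
# Back-of-envelope check: tracking offset/length per row instead of
# per cell drops the last (column) dimension of the map.
sheets, rows, cols = 20, 1_000_000, 50

per_cell_entries = sheets * rows * cols  # one entry per cell
per_row_entries = sheets * rows          # one entry per row

savings = per_cell_entries // per_row_entries  # factor saved
```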
[0053] To further illustrate the operation of the elements of figures 1 to 3, a flow chart is shown in figure 4. Two main flows are shown in figure 4, that is, a first flow 400 for identification of confidential data, and a second flow 401 for annotation and obfuscation of confidential data. The first stream 400 can feed the second stream 401, although other configurations are possible.
[0054] In figure 4, DLP service 121 receives (410) subsets of structured user content consolidated into associated flattened representations, each of the associated flattened representations having a mapping to a corresponding subset of the structured user content. As mentioned above, structured content can include spreadsheet content arranged in sheets/rows/columns, or it can include other structures, such as slide show content arranged in slides/objects, drawing program content arranged in pages/objects, or text content arranged in pages, among other structures. These subsets of structured user content can include 'chunks' 141 to 146 shown in figure 1 or pieces C1-C8 in figure 3, among others. The structure of the underlying user content is flattened or removed in these subsets to form the pieces, and each of the subsets can map back to the original structure by referencing structural identifiers or locators, such as sheets/rows/columns or slides/objects, for example.
[0055] DLP service 121 receives these pieces and the piece metadata, such as over link 160 or API 122 in figure 1, and individually analyzes (411) the flattened representations to classify portions as comprising confidential content that corresponds to one or more predetermined data schemes. Classification rules 125 may establish one or more predetermined data schemes defined by one or more expressions used to analyze the flattened piece/data representations to identify portions of the pieces as being indicative of one or more predetermined content patterns or one or more predetermined content types.
[0056] If confidential data is found (412), then for each portion DLP service 121 determines (413) an associated offset/length relative to the structured user content, as maintained by tracker 123 in data records 332. DLP service 121 then indicates (414) at least the associated offset/length for the portions to user application 111 to mark sensitive content in user interface 112 of user application 111. If no sensitive data is found, or if any associated limits are not reached, then processing of further portions can continue, or monitoring can continue for additional pieces as provided by user application 111. In addition, editing or changing user content may require additional or repeated classification processes for any altered or edited user content.
[0057] The DLP application module 113 receives (415) from classification service 124 of DLP service 121 indications of one or more portions of the user content that contain the confidential content, where the indications comprise offsets/lengths associated with the confidential content. The DLP application module 113 presents (416) graphical indications in user interface 112 of user application 111 that annotate one or more portions of the user content as containing the confidential content. The DLP application module 113 can then present (417) obfuscation options in user interface 112 to mask sensitive content within at least a selected portion of the one or more portions of user content. In response to a user selection of at least one of the obfuscation options, the DLP application module 113 replaces (418) the associated user content with obfuscated content that maintains a data scheme of the associated user content.
[0058] Figure 5 illustrates sequence diagram 500 to further illustrate the operation of the elements of figures 1 to 3. In addition, figure 5 includes a detailed example structure 510 for some of the process steps in figure 5. In figure 5, application 111 can open a document for viewing or editing by a user. This document can be detected by the DLP application module 113. Any associated policies or classification rules can be sent to DLP service 121 to define any classification policies. DLP service 121 can then maintain an open document processing instance in data records 332, which can include a listing of several open documents. When idle processing time periods of application 111 are detected by the DLP module 113, an idle indicator may be presented to DLP service 121, which responsively requests pieces of user content for classification. Alternatively, the DLP module 113 can push pieces of user content to DLP service 121 during idle periods of application 111. The DLP module 113 splits user content into pieces, and these pieces can be determined based on text or other content included in hierarchical structures or objects in the document. Once the pieces have been determined, the DLP module 113 transfers the pieces to DLP service 121 for classification. DLP service 121 classifies each piece individually and applies classification rules to the pieces to potentially identify sensitive user content among the pieces. This classification process can be an iterative process to ensure that all pieces transferred by the DLP module 113 have been processed. If confidential data or content is found among the pieces, then DLP service 121 indicates the presence of confidential data to the DLP module 113 for further handling. As mentioned herein, confidential data can be indicated by offsets, rough locations or other location information, as well as length information. The DLP module 113 can then perform one or more annotation and obfuscation processes on the sensitive data in the document.
[0059] Classification rules can be established before the classification process, such as by users, administrators, policy personnel or other entities. As seen in structure 510, several rules can be based on one or more predicates 511 and 512. The predicates are shown in two categories in figure 5, content-related predicates 511 and access-related predicates 512. Content-related predicates 511 may comprise data schemes that indicate confidential data, such as data patterns, structural data information or regular expressions that define the data schemes. Access-related predicates 512 comprise user-level, organization-level or other access rules, such as content sharing rules that define when sensitive data is not wanted for dissemination or release according to user, organization or other specific factors.
[0060] Policy rules 513 can be established that combine one or more content-related predicates and access-related predicates into policies 551 to 554. Each policy rule also has a priority and an associated action. In general, the priority corresponds to the severity of the action. For example, a policy rule can define that an application's 'save' features should be blocked. In another example policy rule, user content may contain SSNs that are identified according to a content-related predicate, but according to an access-related predicate those SSNs may be acceptable for dissemination. Most policy rules contain at least one classification predicate from among predicates 511 and 512. These policies can affect one or more actions 514. The actions can include various annotation operations that an application can take in response to identification of sensitive content, such as notifying a user, notifying but allowing a user override, blocking of resources/functions (i.e., 'save' or 'copy' features) and justified overrides, among others.
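The combination of predicates, priority and action might be sketched as follows. This is a simplified illustration; the rule names, the predicate signatures and the action strings are all assumptions, not elements of the patent.

```python
# Illustrative sketch: a policy rule pairs a content-related predicate
# with an access-related predicate, plus a priority and an action.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PolicyRule:
    name: str
    content_predicate: Callable[[dict], bool]  # e.g. "content contains an SSN"
    access_predicate: Callable[[dict], bool]   # e.g. "user may not share SSNs"
    priority: int                              # higher = more severe action
    action: str                                # e.g. notify / block_save

def evaluate(rules, context):
    """Return the actions of all rules whose predicates both hold,
    most severe (highest priority) first."""
    fired = [r for r in rules
             if r.content_predicate(context) and r.access_predicate(context)]
    return [r.action for r in sorted(fired, key=lambda r: -r.priority)]

rules = [
    PolicyRule("ssn-block", lambda c: c["has_ssn"],
               lambda c: not c["may_share"], priority=10, action="block_save"),
    PolicyRule("ssn-note", lambda c: c["has_ssn"],
               lambda c: True, priority=1, action="notify"),
]
actions = evaluate(rules, {"has_ssn": True, "may_share": False})
```

With `may_share` set to true, only the low-priority notification would fire, matching the idea that SSNs may be acceptable for dissemination under some access predicates.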
[0061] Figure 6 illustrates flow diagram 600 to further illustrate the operation of the elements in figures 1 to 3. Figure 6 focuses on an example end-to-end process of confidential data identification, annotation and obfuscation. Subprocess 601 comprises the establishment, storage and retrieval of policies and rules. These policies and rules may include annotation rules, classification rules, regular expressions and organization/user policies, among other information discussed herein. In operation 611 of figure 6, various detection rules 630 and substitution rules 631 can be introduced through a user interface or API to configure detection policies. Detection rules 630 and substitution rules 631 can comprise various predicates and rules, as shown in figure 5, among others. Users, administrators, policy personnel or other entities can introduce detection rules 630 and substitution rules 631, such as when establishing policies for users, organizations or application usage, among other entities and activities. Detection rules 630 and substitution rules 631 can be stored on one or more storage systems in operation 612 for later use. When one or more clients wish to use the policies established by detection rules 630 and substitution rules 631, these policies can be downloaded or retrieved in operation 613. For example, annotation rules can be downloaded by an application for use in annotating confidential content in a user interface, while classification rules can be downloaded by a shared DLP service to classify user content as confidential content.
[0062] Subprocess 602 comprises client-side application activities, such as loading documents for editing or viewing in a user interface and providing pieces of those documents for classification. In operation 614, a client application can provide one or more end user experiences
to process user content, edit user content or display user content, among other operations. Operation 614 can also provide the annotation and obfuscation processes that will be discussed later. Operation 615 provides portions of this user content to a shared DLP service for classification of the user content. In some instances, the portions comprise flattened pieces of user content that are stripped of the associated structure or hierarchy of the original document.
[0063] Subprocess 603 comprises the classification of user content to detect confidential data among the user content, as well as the annotation of this confidential data for a user. In operation 616, several detection rules are applied, such as the regular expressions discussed below in figure 7, among other detection rules and processes. If confidential data is found, operation 617 determines whether a user should be notified. Notification may not occur if the amount of sensitive data falls below an alert threshold. However, if the user needs to be alerted, operation 619 can calculate the locations of the confidential data in the detected regions of the structured data. As discussed herein, a mapping process can be employed to determine specific locations of sensitive data within structured or hierarchical elements from flat data offsets and lengths of data strings or portions. Once these specific locations are determined, operation 618 can display the locations to the user. Annotations or other highlighting elements of the user interface are used to signal to the user that confidential data is present in the user content.
[0064] Subprocess 604 comprises obfuscation of confidential data within the user content comprising the structured or hierarchical elements. In operation 621, user input can be received to replace at least one instance of sensitive data with safe or obfuscated data/text. When a user views a highlighted region that marks a piece of sensitive data that caused an annotation or policy tip, the user may be given an option to replace the sensitive data with safe text that obfuscates the sensitive data. Depending on the choices made by the entities that initially define the policies in operation 611, operations 622 and 624 determine and generate one or more substitution or obfuscation rules. Obfuscation rules can be used to replace an internal code name with a name approved by marketing, can be used to obfuscate personally identifiable information (PII) with standardized names, or can be used to replace sensitive numeric data with a set of characters that indicates to future readers of the document the type of confidential data (i.e., credit card numbers, social security numbers, vehicle identification numbers, among others) without revealing the actual confidential data. Operation 623 replaces the confidential data with obfuscated data. The obfuscated data can replace sensitive numeric data with a character set that could be used to confirm a data scheme or content type, but remains insufficient for deriving the original data, even by a determined individual (i.e., it can be determined that the content is an SSN, but the actual SSN is not revealed). Users can perform individual or single-instance replacement of sensitive content with obfuscated text, or bulk replacement from a user interface that shows multiple instances of confidential content.
[0065] The substitution of confidential content, such as text or alphanumeric content, can be done with regular expressions, or alternatively through non-deterministic finite automata (NFA), deterministic finite automata (DFA), pushdown automata (PDA), Turing machines, arbitrary functional code or other processes. Substituting sensitive content usually includes pattern matching of the text or content. Such pattern matching can leave characters or content unmasked when the target pattern allows multiple characters at a specified location in a string and those characters do not need to be masked, as with delimiting characters. For example, a string 123-12-1234 can become xxx-xx-xxxx and a string 123 12 1234 can become xxx xx xxxx after a masking process. Such pattern matching can also keep certain portions discernible for purposes of uniqueness, such as the last predetermined number of digits of a credit card number or SSN. For example, 1234-1234-1234-1234 can become xxxx-xxxx-xxxx-1234 after a masking process. For code name masking/substitution, not all aspects are pattern-based and may in fact be internal code names or other keywords. For example, a code name Whistler can become Windows XP after a masking process. In addition, patterns that replace a varied number of characters with safe text can maintain a consistent length or set the length to a known constant. For example, the same rule can transform 1234-1234-1234-1234 into xxxx-xxxx-xxxx-1234 and xxxxxxxxxx-x1234 after a masking process. This may require a pattern that contains enough data to handle any of these cases. Regular expressions can handle these scenarios by augmenting the regular expression, surrounding each atom-matched expression with parentheses and keeping track of which augmented 'match' instructions are paired with which 'replace' instructions. Other examples of regular expression matching are seen in figure 7 below.
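A sketch of the delimiter-preserving, last-four-digits masking described above follows. The card-number pattern and the helper name `mask_card` are illustrative assumptions; a real rule set would be configurable.

```python
# Illustrative sketch: mask a card-like number while keeping the
# delimiters ('-' or ' ') and the last four digits discernible, so the
# result still reflects the data scheme of the original content.
import re

def mask_card(text):
    def repl(m):
        s = m.group(0)
        total = sum(ch.isdigit() for ch in s)
        masked, digits_seen = [], 0
        for ch in s:
            if ch.isdigit():
                digits_seen += 1
                # keep only the final four digits for uniqueness
                masked.append(ch if digits_seen > total - 4 else "x")
            else:
                masked.append(ch)  # delimiters pass through unmasked
        return "".join(masked)
    # backreference \1 forces a consistent delimiter across the groups
    return re.sub(r"\b\d{4}([- ])\d{4}\1\d{4}\1\d{4}\b", repl, text)

out = mask_card("card 1234-1234-1234-1234 on file")
```

Because the masked string keeps the delimiter positions and length, a reader can still confirm the content type without recovering the original digits.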
[0066] To maintain the integrity of the annotation and classification processes across more than one document/file, several processes can be established. Detection/classification, annotation and obfuscation rules and policies are not normally included in the document files. This allows for policy changes and prevents reverse engineering of obfuscation techniques. For example, if a user saves a document, closes it, and reloads the same document, the rules determining which parts of the document contain confidential data sufficient to constitute a policy issue may have changed. In addition, annotation flags should not be included in clipboard operations, such as cut, copy or paste. If a user copies the contents of one document and pastes them into another, that second document may have different detection/classification, annotation and obfuscation rules applied. If a user copies text from a first document and pastes it into a second document, the annotations from the first document should be considered irrelevant until the content is reclassified. Even if a user copies content from a document into the same document, any count of confidential content can change, and what needs to be highlighted throughout the document can change.
[0067] Figure 7 illustrates flow diagram 700 to further illustrate the operation of the elements in figures 1 to 3. Figure 7 focuses on regular expression operations during the process of obfuscating confidential data. In figure 7, given a regular expression (regex), such as the fictitious driver's license regular expression 730, and a corresponding string, a complete match can be generated by at least augmenting the regular expression, surrounding each separable character-matching expression (for example, each atom) with parentheses, as indicated in operation 711. The augmented regular expression can then be reapplied or executed in operation 712 to perform an obfuscation or masking process. For each match, operations 713 to 714 determine the widest and narrowest sets of characters actually matched. For example, when the matched expression corresponds to a single character, the set is narrow. When the matched expression corresponds to the set of all alphabetic characters, the set is wide. The absolute count of characters that could occur in any region is the main determinant. An obfuscation process in operation 715 can replace characters according to a match span. For matched characters that are unique characters, an obfuscation process need not change them. For matched characters that belong to large sets, an obfuscation process replaces the characters with a safe character that is not a member of the set. For example, a set of all letters becomes '0', a set of all numbers becomes 'X', and mixed alphanumeric content becomes another character from an alternative list of characters to use until exhausted. After the text or content has been obfuscated or masked, operation 716 confirms that the text or content has been successfully obfuscated when the new text/content string no longer matches the original regular expression.
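The augmentation step can be sketched with a deliberately simplified pattern. The fictitious license scheme below is an assumption standing in for regular expression 730; each atom is wrapped in its own group so that narrow atoms (the single dash) pass through while wide atoms (any letter, any digit) are replaced by a safe character outside their set.

```python
# Illustrative sketch: augment a pattern with one group per atom,
# replace wide character classes with safe characters outside the set,
# then confirm the result no longer matches the original pattern.
import re

ORIGINAL = re.compile(r"[A-Z]\d{3}-\d{2}")           # fictitious license scheme
AUGMENTED = re.compile(r"([A-Z])(\d{3})(-)(\d{2})")  # one group per atom
SAFE = {"letter": "0", "digit": "X"}  # safe chars not in the matched sets

def obfuscate(text):
    def repl(m):
        letter, d3, dash, d2 = m.groups()
        # wide sets are masked; the narrow '-' atom is kept as-is
        return (SAFE["letter"]
                + SAFE["digit"] * len(d3)
                + dash
                + SAFE["digit"] * len(d2))
    return AUGMENTED.sub(repl, text)

masked = obfuscate("license A123-45")
still_sensitive = bool(ORIGINAL.search(masked))  # expect no remaining match
```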
[0068] Figure 8 illustrates graph diagram 800 to further illustrate the operation of the elements of figures 1 to 3. Figure 8 focuses on the enhanced limit processes used in annotating sensitive data in user interfaces. The operations in figure 8 can include enhanced hysteresis operations for annotating sensitive data, and various annotation rules or limits can be configured by policy administrators or users, among other entities.
[0069] Figure 8 includes graph 800, which includes a vertical axis indicating a number of sensitive data/content items present in a document and a horizontal axis indicating time. A first limit 820 is established, which can initiate the presentation or removal of sensitive content annotations in a user interface. A second limit 822 can be established, which can also initiate the presentation or removal of sensitive content annotations. An elasticity factor 821 and a resilience property 823 can be established to modify the behavior of the first and second limits.
[0070] When confidential data has been annotated in a user interface, such as by flags, markings or highlights, a user can edit the confidential content to correct problems with the confidential content (such as by selecting one or more obfuscation options). However, after a number of confidential content issues have been resolved, there may not be enough instances left to justify annotating the document as being in general violation of the confidential content rules for the save disposition or location. Likewise, when new confidential content is introduced into a document, there may come to be sufficient instances to justify annotating the document to indicate the confidential content to a user.
[0071] During content editing processes by users, enabling and disabling annotation indicators for one or more content elements may be based, at least in part, on a current quantity of the content elements with respect to the annotation rules. Annotation rules can comprise at least the first limit quantity 820, the elasticity factor 821 that modifies the first limit quantity 820 to a second limit quantity 822 when activated, and an indication of a resilience limit or 'viscosity' property 823 indicating when the second limit quantity 822 overrides the first limit quantity 820. An annotation service, such as annotator 212, can determine or identify annotation rules, such as policy rules 513 and actions 514 discussed in figure 5, which are established for target entities associated with the content editing. Target entities can include users performing the content editing, a disposition that comprises the user who performs the content editing, or a type of user application, among others. During user editing of a document that contains confidential content, or that potentially may contain confidential content, annotator 212 monitors the user content in an associated user data file presented for content editing in a user interface of the user application. Annotator 212 identifies a number of content elements containing confidential content among the user content corresponding to one or more predetermined data schemes discussed herein. Content elements can include cells, objects, shapes, words or other structural or hierarchical data elements.
[0072] During editing, and based at least on the quantity of content elements exceeding a first limit quantity, annotator 212 starts the presentation of at least one annotation indicator in the user interface that flags the user content in the interface as containing at least first confidential content. In figure 8 (starting with annotations in an off state), the first limit 820 indicates an example quantity of '8' at transition point 830 as triggering the presentation of annotation indicators in a user interface. The number of content elements with sensitive content can increase, for example, through user editing, and then can decrease after a user sees that sensitive content is present and begins to select obfuscation options to mask that sensitive content.
[0073] Based at least on the quantity of content elements initially exceeding the first limit quantity 820 and subsequently falling below the first limit quantity 820 when the elasticity factor 821 is applied to the first limit quantity 820, annotator 212 establishes the second limit quantity 822 based at least on the elasticity factor. When the second limit quantity 822 is active (that is, when the elasticity factor 821 applies to the first limit quantity 820), then the second limit quantity 822 is used to start removing the display of at least one annotation indicator when the quantity falls below the second limit quantity 822, as seen at transition point 832. However, based at least on the quantity of content elements initially exceeding the first limit quantity 820 and subsequently falling below the first limit quantity 820 when the factor of elasticity is not applied to the first limit quantity 820, at least one annotation indicator is removed, as indicated by transition point 831.
[0074] The elasticity factor 821 can comprise a percentage that varies from 0 to 100%, or another metric. In a specific example, an annotation rule can be established defining that the inclusion of more than 100 SSNs in a document violates corporate policy. When editing a document that exceeds 100 SSNs, an annotation rule for a first limit quantity may request that all SSNs in the document be highlighted. When a user starts obfuscating SSNs, the amount of remaining non-obfuscated SSNs is reduced. The elasticity factor can maintain the annotation or highlighting of SSNs even if the first limit quantity 820 that triggered the annotation is no longer met, such as when 99 SSNs remain unobfuscated. An elasticity factor of 100 would correspond to an unmodified first limit quantity, and an elasticity of 0 would correspond to the annotations never being removed until all SSNs were obfuscated. An intermediate value of 50 for the elasticity factor would correspond to removal of the annotations once the quantity falls to 50, after the annotations were initially triggered to be presented. Thus, in the example of figure 8, the elasticity factor establishes a second limit quantity for removing the annotations once the annotations have been presented to a user. In this example, the second limit quantity 822 is at '2' and, therefore, when the remaining confidential content issues fall below '2', the annotations will be removed, as indicated by transition point 832.
[0075] If the quantity falls below the second limit quantity 822 and additional confidential content issues then arise during content editing, annotator 212 must decide when to alert the user by presenting the annotations again. Based at least on the quantity of content elements initially being below the second limit quantity 822 and subsequently exceeding the second limit quantity 822 when the limit resilience property 823 is applied to the second limit quantity 822, annotator 212 starts the presentation of additional annotations in the user interface that flag the user content in the user interface as containing sensitive content, as indicated by transition point 833.
[0076] The resilience property 823 comprises a stickiness or viscosity property for the second limit quantity 822 and is defined by an on/off or Boolean condition. When deactivated, the second limit quantity 822 is not used to re-present the annotations if it is exceeded. When activated, the second limit quantity 822 is used to re-present the annotations if it is exceeded. Therefore, based at least on the quantity of content elements initially being below the second limit quantity 822 and subsequently exceeding the second limit quantity 822 when the resilience property is not applied to the second limit quantity 822, annotator 212 withholds the presentation of the annotations that flag the user content in the user interface as containing at least confidential content until the quantity of content elements again exceeds the first limit quantity 820.
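The two-limit behavior of paragraphs [0072] to [0076] is, in effect, a hysteresis loop over the count of sensitive content elements. The following is a minimal sketch of one plausible reading, assuming the second limit quantity is derived by scaling the first limit by the elasticity percentage; the class and method names are hypothetical and not taken from the specification:

```python
class AnnotationController:
    """Hysteresis over the count of sensitive content elements (cf. figure 8)."""

    def __init__(self, first_limit, elasticity_pct, resilient):
        self.first_limit = first_limit                  # e.g. '8' in figure 8
        # Assumption: second limit = first limit scaled by the elasticity factor,
        # so elasticity 100 leaves the first limit unmodified, and 25% of 8 gives '2'.
        self.second_limit = first_limit * elasticity_pct / 100.0
        self.resilient = resilient                      # resilience property 823
        self.showing = False
        self.triggered_once = False

    def update(self, count):
        if self.showing:
            # Transitions 831/832: remove annotations when the count drops
            # below the elastic second limit.
            if count < self.second_limit:
                self.showing = False
        else:
            if not self.triggered_once:
                threshold = self.first_limit            # transition 830
            elif self.resilient:
                threshold = self.second_limit           # transition 833
            else:
                threshold = self.first_limit            # re-trigger at first limit only
            if count > threshold:
                self.showing = True
                self.triggered_once = True
        return self.showing
```

With a first limit of 8, an elasticity factor of 25% and resilience activated, annotations appear once the count exceeds 8, persist while it falls toward 2, disappear below 2, and reappear as soon as the count again exceeds 2.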
[0077] Turning now to figure 9, computing system 901 is presented. Computing system 901 is representative of any system or collection of systems in which the various architectures, scenarios and operational processes described herein can be implemented. For example, computing system 901 can be used to implement any of user platform 110 or DLP platform 120 of figure 1. Examples of computing system 901 include, but are not limited to, server computers, cloud computing systems, distributed computing systems, software-defined network systems, computers, desktop computers, hybrid computers, rack servers, web servers, cloud computing platforms and data center equipment, as well as any other type of physical or virtual server, and other computing systems and devices, as well as any variation or combination thereof. When portions of the 901 computing system are implemented on user devices, examples of such devices include smartphones, laptops, tablet computers, desktop computers, gaming systems, entertainment systems and the like.
[0078] The computing system 901 can be implemented as a single device, system or apparatus, or can be implemented in a distributed manner as multiple devices, systems or apparatuses. The computing system 901 includes, but is not limited to, processing system 902, storage system 903, software 905, communication interface system 907 and user interface system 908. Processing system 902 is operatively coupled with the storage system 903, communication interface system 907 and user interface system 908.
[0079] Processing system 902 loads and runs software 905 from storage system 903. Software 905 includes the DLP 906 application environment and/or the DLP 909 shared environment, which are representative of the processes discussed in relation to the previous figures. When run by processing system 902 to process user content for identification, annotation and obfuscation of confidential content, software 905 directs processing system 902 to operate as described here for at least the various processes, operational scenarios and environments discussed in the previous implementations. The 901 computing system may optionally include additional devices, features or functionality not discussed here, for brevity.
[0080] Referring further to figure 9, processing system 902 may comprise a microprocessor and processing circuitry that retrieves and runs software 905 from storage system 903. Processing system 902 can be implemented within a single processing device, but can also be distributed across multiple processing devices or subsystems that cooperate in executing program instructions. Examples of the 902 processing system include general purpose central processing units, application-specific processors and logic devices, as well as any other type of processing device, and combinations or variations thereof.
[0081] The storage system 903 can comprise any computer-readable storage media readable by processing system 902 and capable of storing software 905. The storage system 903 can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules or other data. Examples of storage media include random access memory, read-only memory, magnetic disks, resistive memory, optical disks, flash memory, virtual memory and non-virtual memory, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other suitable storage medium. In no case is the computer-readable storage medium a propagated signal.
[0082] In addition to a computer-readable storage medium, in some implementations the 903 storage system may also include a computer-readable communication medium over which at least some of the 905 software can be communicated internally or externally. The storage system 903 can be implemented as a single storage device, but it can also be implemented across multiple storage devices or subsystems co-located or distributed in relation to each other. The 903 storage system can comprise additional elements, such as a controller, capable of communicating with the processing system 902 or possibly other systems.
[0083] Software 905 can be implemented in program instructions and, among other functions, when executed by processing system 902, directs processing system 902 to operate as described in relation to the various operational scenarios, sequences and processes illustrated here. For example, the 905 software may include program instructions for implementing the dataset processing environments and platforms discussed here.
[0084] In particular, the program instructions may include various components or modules that cooperate or otherwise interact to carry out the various processes and operational scenarios described here. The various components or modules can be incorporated into compiled or interpreted instructions, or in some other variation or combination of instructions. The various components or modules can be executed synchronously or asynchronously, in series or in parallel, in a single-threaded or multi-threaded environment, or according to any other suitable execution paradigm, variation or combination thereof. The 905 software may include additional processes, programs or components, such as operating system software or other application software, in addition to or including the DLP 906 application environment or DLP 909 shared environment. The 905 software may also include firmware or some other form of machine-readable processing instructions executable by the 902 processing system.
[0085] In general, software 905 can, when loaded into processing system 902 and executed, transform a suitable device, system or apparatus (of which the computing system 901 is representative) overall from a general purpose computing system into a special purpose computing system customized to facilitate improved processing of user content for identification, annotation and obfuscation of confidential content. In fact, encoding software 905 on the storage system 903 can transform the physical structure of the storage system 903. The specific transformation of the physical structure can depend on several factors in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the storage medium of the 903 storage system and whether the computer storage medium is characterized as primary or secondary storage, as well as other factors.
[0086] For example, if the computer-readable storage medium is implemented as semiconductor-based memory, the 905 software can transform the physical state of the semiconductor memory when the program instructions are encoded therein, such as by transforming the state of transistors, capacitors, or other discrete circuit elements that make up the semiconductor memory. A similar transformation can occur with respect to magnetic or optical media. Other transformations of physical media are possible without departing from the scope of the present description, with the previous examples provided only to facilitate the present discussion.
[0087] The DLP 906 application environment or DLP 909 shared environment each includes one or more software elements, such as OS 921/931 and 922/932 applications. These elements can describe various portions of the 901 computing system with which users, data sources, data services or other elements interact. For example, OS 921/931 can provide a software platform on which the 922/932 application is executed and which allows the processing of user content for identification, annotation and obfuscation of confidential content, among other functions.
[0088] In one example, the DLP 932 service includes content divider 924, annotator 925, mapper 926 and obfuscator 927. Content divider 924 flattens structured or hierarchical user content elements into linear pieces for processing by a classification service. Annotator 925 graphically highlights sensitive data or content in a user interface so that users can be alerted to the presence of a limited amount of sensitive data. Mapper 926 can derive document-specific locations for confidential data annotations, such as when only offsets/lengths/IDs are provided by a classification service, to locate sensitive data among the various structural or hierarchical elements of the document. The 927 obfuscator offers obfuscation options for masking/replacing user content that has been identified as confidential data. Obfuscator 927 also replaces sensitive content in response to user selections among the obfuscation options.
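As a concrete illustration of the schema-preserving replacement performed by a component like obfuscator 927, the sketch below masks SSN-shaped values while keeping the delimiting dashes, so the ###-##-#### data schema of the content survives obfuscation. The pattern and function names are illustrative assumptions, not part of the specification:

```python
import re

# Hypothetical data schema: a US social security number (###-##-####).
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

def obfuscate_ssns(text, mask_char="X"):
    """Replace each SSN digit with a masking symbol while preserving the
    delimiting dashes, so the data schema of the content is maintained."""
    def mask(match):
        # One mask character per original digit; dashes are re-inserted.
        return "-".join(mask_char * len(group) for group in match.groups())
    return SSN_PATTERN.sub(mask, text)
```

For example, `obfuscate_ssns("SSN: 123-45-6789")` yields `"SSN: XXX-XX-XXXX"`, preventing identification of the value while leaving its schema intact for downstream processing.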
[0089] In another example, the DLP 933 service includes the classification service 934, the tracker 935, the policy/rules module 936 and the regex service 937. The classification service 934 analyzes linear pieces of data or content to identify confidential data. Tracker 935 maintains counts or quantities of sensitive data items found by the classification service 934, and indicates the offsets and lengths of confidential data to a mapper for annotating a document (such as mapper 926 and annotator 925). The 936 policy/rules module can receive and maintain various policies and rules for annotation, classification, detection, obfuscation or other operations on user content. The regex 937 service comprises an example of a classification technique that uses regular expression matching to identify sensitive data using data patterns or data schemas and to replace the text of the corresponding content with obfuscated content.
[0090] The communication interface system 907 may include communication connections and devices that allow communication with other computing systems (not shown) through communication networks (not shown). Examples of connections and devices that together allow communication between systems can include network interface cards, antennas, power amplifiers, RF circuits, transceivers and other communication circuits. Connections and devices can communicate through communication media to exchange communications with other computing systems or networks of systems, such as metal, glass, air or any other suitable communication medium. Physical or logical elements of the 907 communication interface system can receive data sets from telemetry sources, transfer data sets and control information between one or more distributed data storage elements, and interact with a user to receive data selections and provide visualized data sets, among other resources.
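The division of labor between classification service 934 and tracker 935 described in paragraph [0089] can be sketched as a regex pass over each flattened piece that reports (offset, length, type) indications for a mapper to translate back into document locations. This is an illustrative sketch under assumed pattern definitions, not the specification's own code:

```python
import re

# Assumed policy-defined data schemas; a real deployment would load these
# from the policy/rules module 936 rather than hard-coding them here.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def classify(piece):
    """Return (offset, length, kind) for each sensitive match found in one
    linear piece of flattened user content."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(piece):
            findings.append((match.start(), match.end() - match.start(), kind))
    # Sorted by offset so a mapper can walk the piece front to back.
    return sorted(findings)
```

Each returned tuple corresponds to the offsets and lengths the tracker is described as handing to the mapper for annotation.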
[0091] User interface system 908 is optional and can include a keyboard, a mouse, a voice input device, or a touch input device to receive input from a user. Output devices, such as a monitor, speakers, web interfaces, terminal interfaces and other types of output devices can also be included in the 908 user interface system. The 908 user interface system can provide output and receive input via a network interface, such as communication interface system 907. In network examples, the 908 user interface system can package display data or graphics for remote display by a display system or computing system coupled through one or more network interfaces. Physical or logical elements of the 908 user interface system can receive classification rules or policies from users or policy personnel, receive user data editing activities, present confidential content annotations to users, provide obfuscation options to users and present obfuscated user content to users, among other operations. The 908 user interface system may also include associated user interface software, executable by the 902 processing system, in support of the various user input and output devices discussed above. Separately or in conjunction with each other and with other hardware and software elements, the user interface software and user interface devices can support a graphical user interface, a natural user interface or any other type of user interface.
[0092] Communication between the computing system 901 and other computing systems (not shown) can occur through a communication network or networks and according to various communication protocols, combinations of protocols, or variations thereof. Examples include intranets, internets, the Internet, local area networks, wide area networks, wireless networks, wired networks, virtual networks, software-defined networks, data center buses, computing backplanes, or any other type of network, combination of networks or variation thereof. The communication networks and protocols mentioned above are well known and need not be discussed in detail here. However, some communication protocols that can be used include, among others, the Internet protocol (IP, IPv4, IPv6, etc.), the transmission control protocol (TCP) and the user datagram protocol (UDP), as well as any other suitable communication protocol, variation or combination thereof.
[0093] Certain inventive aspects can be appreciated from the previous description, of which the following are several examples.
[0094] Example 1: A method for providing a data obfuscation framework for a user application, the method comprising providing user content to a classification service configured to process the user content to classify portions of the user content as confidential content corresponding to one or more predetermined data schemes, and receiving from the classification service indications of one or more portions of the user content that contain the confidential content. The method includes displaying graphical indications in a user interface for the user application that annotate the one or more portions of the user content as containing the confidential content, and presenting obfuscation options in the user interface to mask the confidential content in at least a selected portion of the one or more portions of the user content. In response to a user selection of at least one of the obfuscation options, the method includes replacing the associated user content with obfuscated content that maintains a data schema of the associated user content.
[0095] Example 2: The method of Example 1, further comprising presenting the obfuscation options as comprising a first option for masking the confidential content within the selected portion and a second option for masking the confidential content within the selected portion and additional portions of the user content comprising additional confidential content having a data scheme similar to that of the selected portion.
[0096] Example 3: The method of Examples 1 to 2, further comprising presenting the obfuscation options as indicating at least one example of an obfuscated version of the target user content within the selected portion.
[0097] Example 4: The method of Examples 1 to 3, wherein the graphic indications that annotate the one or more portions of the user content comprise indicators positioned next to the one or more portions that are selectable in the user interface to present the obfuscation options.
[0098] Example 5: The method of Examples 1 to 4, wherein the obfuscated content that maintains the data schema of the associated user content comprises symbols selected based in part on the data schema of the associated user content to prevent identification of the associated user content while maintaining the data schema of the associated user content.
[0099] Example 6: The method of Examples 1 to 5, further comprising, in response to the replacement of the associated user content with the obfuscated content, providing the obfuscated content to the classification service to confirm that the obfuscated content does not contain further confidential content.
[00100] Example 7: The method of Examples 1 to 6, wherein the one or more predetermined data schemes are defined by one or more regular expressions used to analyze the user content to identify the portions as being indicative of one or more predetermined content patterns or one or more predetermined content types.
[00101] Example 8: The method of Examples 1 to 7, wherein the one or more predetermined data schemes each comprise first portions to be obfuscated and second portions to remain unobfuscated, the first portions to be obfuscated corresponding to locations having more than one permitted character, and the second portions to remain unobfuscated having only one permitted character comprising a delimiting character. The method further comprises identifying whether a part of the first portions is designed to remain discernible as to its uniqueness after obfuscation, and designating the part to remain unobfuscated.
[00102] Example 9: A data obfuscation framework for a user application, comprising one or more computer-readable storage media, a processing system operationally coupled to the one or more computer-readable storage media, and program instructions stored on the one or more computer-readable storage media. Based at least on being read and executed by the processing system, the program instructions direct the processing system to at least provide user content to a classification service configured to process the user content to classify portions of the user content as confidential content corresponding to one or more predetermined data schemes, and receive from the classification service indications of one or more portions of the user content that contain the confidential content. Based at least on being read and executed by the processing system, the program instructions further direct the processing system to at least present graphical indications in a user interface for the user application that annotate the one or more portions of the user content as containing the confidential content, present obfuscation options in the user interface to mask the confidential content in at least a selected portion of the one or more portions of the user content and, in response to a user selection of at least one of the obfuscation options, replace the associated user content with obfuscated content that maintains a data schema of the associated user content.
[00103] Example 10: The data obfuscation framework of Example 9, further comprising program instructions that, based at least on being read and executed by the processing system, direct the processing system to at least present the obfuscation options as comprising a first option to mask the confidential content within the selected portion and a second option to mask the confidential content within the selected portion and additional portions of the user content comprising additional confidential content having a data scheme similar to that of the selected portion.
[00104] Example 11: The data obfuscation framework of Examples 9 to 10, comprising additional program instructions that, based at least on being read and executed by the processing system, direct the processing system to at least present the obfuscation options as indicating at least one example of an obfuscated version of the target user content within the selected portion.
[00105] Example 12: The data obfuscation framework of Examples 9 to 11, wherein the graphic indications that annotate the one or more portions of the user content comprise indicators positioned in proximity to the one or more portions that are selectable in the user interface to present the obfuscation options.
[00106] Example 13: The data obfuscation framework of Examples 9 to 12, wherein the obfuscated content that maintains the data scheme of the associated user content comprises symbols selected based in part on the data scheme of the associated user content to prevent identification of the associated user content while maintaining the data schema of the associated user content.
[00107] Example 14: The data obfuscation framework of Examples 9 to 13, comprising additional program instructions that, based at least on being read and executed by the processing system, direct the processing system to, at least in response to the replacement of the associated user content with the obfuscated content, provide the obfuscated content to the classification service to confirm that the obfuscated content no longer contains confidential content.
[00108] Example 15: The data obfuscation framework of Examples 9 to 14, in which one or more predetermined data schemes are defined by one or more regular expressions used to analyze user content to identify portions as being indicative of one or more predetermined content standards or one or more predetermined content types.
[00109] Example 16: The data obfuscation framework of Examples 9 to 15, wherein the one or more predetermined data schemes each comprise first portions to be obfuscated and second portions to remain unobfuscated, the first portions to be obfuscated corresponding to locations having more than one permitted character and the second portions to remain unobfuscated having only one permitted character comprising a delimiting character. The data obfuscation framework comprises additional program instructions that, based at least on being read and executed by the processing system, direct the processing system to at least identify whether a part of the first portions is designed to remain discernible as to its uniqueness after obfuscation, and designate the part to remain unobfuscated.
[00110] Example 17: A method of operating a user application, the method comprising providing user content from a user data file to a classification service configured to process the user content to classify one or more portions of the user content as confidential content corresponding to one or more data schemes, and displaying indicators in a user interface that flag the one or more portions of the user content as containing the confidential content, where the indicators are positioned next to the one or more portions and are selectable in the user interface to present obfuscation options. In response to a selection of a first of the indicators, the method includes presenting first obfuscation options in the user interface to replace associated confidential content in a first portion of the user content flagged by the first of the indicators. In response to a selection of at least one of the first obfuscation options, the method includes replacing the associated confidential content with obfuscated content that maintains a data schema of the associated confidential content.
[00111] Example 18: The method of Example 17, further comprising presenting the first obfuscation options as comprising a first option to replace the associated confidential content with the obfuscated content and a second option to replace the associated confidential content and additional confidential content of the user data file having a data schema similar to that of the associated confidential content.
[00112] Example 19: The method of Examples 17 to 18, wherein the obfuscated content that maintains the data scheme of the associated confidential content comprises one or more symbols selected to prevent identification of the associated confidential content while maintaining the data scheme of the associated user content, wherein the one or more symbols are selected based in part on the data scheme of the associated confidential content.
[00113] Example 20: The method of Examples 17 to 19, wherein the one or more data schemes each comprise first portions to be obfuscated and second portions to remain unobfuscated, the first portions to be obfuscated corresponding to locations having more than one permitted character, and the second portions to remain unobfuscated having only one permitted character comprising a delimiting character. The method further comprises identifying whether a part of the first portions is designed to remain discernible as to its uniqueness after obfuscation and designating the part to remain unobfuscated.
[00114] The functional block diagrams, operational scenarios and sequences, and flowcharts provided in the figures are representative of exemplary systems, environments and methodologies for carrying out the novel aspects of the invention. Although, for simplicity of explanation, the methods included herein may be in the form of a functional diagram, operational scenario or sequence, or flowchart, and may be described as a series of acts, it is to be understood and appreciated that the methods are not limited by the order of the acts, as some acts may, accordingly, occur in a different order and/or concurrently with other acts than those shown and described here. For example, those skilled in the art will understand and appreciate that a method could alternatively be represented as a series of interrelated states or events, such as in a state diagram. In addition, not all of the acts illustrated in a methodology may be necessary for a novel implementation.
[00115] The descriptions and figures included here describe specific implementations to teach those skilled in the art how to make and use the best mode. For the purpose of teaching inventive principles, some conventional aspects have been simplified or omitted. Those skilled in the art will appreciate variations of these implementations that fall within the scope of the invention. Those skilled in the art will also appreciate that the features described above can be combined in various ways to form multiple implementations. As a result, the invention is not limited to the specific implementations described above, but only by the claims and their equivalents.
Claims (15)
1. Method for providing a data obfuscation framework for a user application, characterized by the fact that the method comprises:
providing user content to a rating service configured to process user content to classify portions of user content as comprising confidential content corresponding to one or more predetermined data schemes;
receive from the classification service indications of one or more portions of the user content that contain the confidential content;
display graphical indications in a user interface for the user application that annotate the one or more portions of the user content as containing the confidential content;
present obfuscation options in the user interface to mask confidential content within at least a selected portion from one or more portions of the user content; and in response to a user selection of at least one of the obfuscation options, replace the associated user content with obfuscated content that maintains a data schema of the associated user content.
2. Method, according to claim 1, characterized by the fact that it additionally comprises:
present the obfuscation options as comprising a first option for masking the confidential content within the selected portion and a second option for masking the confidential content within the selected portion and additional portions of the user content comprising additional confidential content having a data scheme similar to that of the selected portion.
3. Method, according to claim 1, characterized by the fact that it additionally comprises:
present the obfuscation options as indicating at least one example of an obfuscated version of the target user content within the selected portion.
4. Method, according to claim 1, characterized by the fact that the graphic indications that annotate the one or more portions of the user content comprise indicators positioned next to the one or more portions that are selectable in the user interface to present the obfuscation options.
5. Method according to claim 1, characterized by the fact that the obfuscated content that maintains the data scheme of the associated user content comprises symbols selected based in part on the data scheme of the associated user content to prevent identification of the associated user content while maintaining the data schema of the associated user content.
6. Method, according to claim 1, characterized by the fact that it additionally comprises:
in response to the replacement of user content associated with the obfuscated content, provide the obfuscated content to the rating service to confirm that the obfuscated content no longer contains confidential content.
7. Method, according to claim 1, characterized by the fact that the one or more predetermined data schemes are defined by one or more regular expressions used to analyze the user content to identify the portions as being indicative of one or more predetermined content standards or one or more types of predetermined content.
8. Method, according to claim 1, characterized by the fact that the one or more predetermined data schemes comprise each of the first portions to be obfuscated and the second portions to remain unobfuscated, the first portions to be obfuscated corresponding to locations having more than one permitted character, and the second portions to remain unobfuscated having only one permitted character comprising a delimiting character; and further comprising:
identify whether a part of the first portions is designated to remain discernible as to uniqueness after obfuscation, and designate that part to remain unobfuscated.
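One possible reading of this claim, sketched under assumptions (the token scheme and helper names are hypothetical): delimiter characters pass through unobfuscated, the other data locations are masked, and the designated part is replaced consistently so distinct values remain discernible as to uniqueness without being revealed:

```python
_token_registry: dict = {}

def stable_token(part: str) -> str:
    """Assign each distinct part a stable numeric token (0000, 0001, ...),
    so equal originals map to equal tokens and distinct ones stay distinct."""
    if part not in _token_registry:
        _token_registry[part] = f"{len(_token_registry):04d}"
    return _token_registry[part]

def obfuscate_keep_uniqueness(value: str) -> str:
    """Mask an SSN-shaped value: '-' delimiters remain unobfuscated,
    leading groups are zeroed, and the last group keeps uniqueness."""
    groups = value.split("-")
    masked = ["0" * len(g) for g in groups[:-1]]
    masked.append(stable_token(groups[-1]))
    return "-".join(masked)

print(obfuscate_keep_uniqueness("123-45-6789"))  # prints 000-00-0000
print(obfuscate_keep_uniqueness("987-65-6789"))  # same tail -> prints 000-00-0000
print(obfuscate_keep_uniqueness("123-45-9999"))  # new tail -> prints 000-00-0001
```

Rows that shared a value before obfuscation still share one after, so joins and duplicate detection on the masked data remain possible.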
[9]
9. Data obfuscation framework for a user application, characterized by the fact that it comprises:
one or more computer-readable storage media;
a processing system operationally coupled with the one or more computer-readable storage media; and
program instructions stored on the one or more computer-readable storage media that, based at least on being read and executed by the processing system, direct the processing system to at least:
provide user content to a rating service configured to process the user content to classify portions of the user content as comprising confidential content corresponding to one or more predetermined data schemes;
receive from the classification service indications of one or more portions of the user content that contain the confidential content;
display graphical indications in a user interface for the user application that annotate one or more portions of the user content as containing the confidential content;
present obfuscation options in the user interface to mask the confidential content within at least a selected portion of the one or more portions of the user content; and
in response to a user selection of at least one of the obfuscation options, replace the associated user content with obfuscated content that maintains a data schema of the associated user content.
[10]
10. Data obfuscation framework, according to claim 9, characterized by the fact that it comprises additional program instructions, based at least on being read and executed by the processing system, to direct the processing system to at least:
present the obfuscation options as comprising a first option to mask the confidential content within the selected portion and a second option to mask the confidential content within the selected portion and additional portions of the user content comprising additional confidential content having a data scheme similar to that of the selected portion; and
present the obfuscation options as indicating at least one example of an obfuscated version of the target user content within the selected portion.
[11]
11. Data obfuscation framework, according to claim 9, characterized by the fact that the graphic indications that annotate one or more portions of the user's content comprise indicators positioned next to one or more portions that are selectable in the user interface to present the obfuscation options.
[12]
12. Data obfuscation framework, according to claim 9, characterized by the fact that the obfuscated content that maintains the data scheme of the associated user content comprises symbols selected based in part on the data scheme of the associated user content to prevent identification of the associated user content while maintaining the data schema of the associated user content.
[13]
13. Data obfuscation framework, according to claim 9, characterized by the fact that it comprises additional program instructions, based at least on being read and executed by the processing system, to direct the processing system to at least:
in response to replacing the associated user content with the obfuscated content, provide the obfuscated content to the rating service to confirm that the obfuscated content no longer contains confidential content.
[14]
14. Data obfuscation framework, according to claim 9, characterized by the fact that the one or more predetermined data schemes are defined by one or more regular expressions used to analyze the user content to identify the portions as being indicative of one or more predetermined content patterns or one or more predetermined content types.
[15]
15. Data obfuscation framework, according to claim 9, characterized by the fact that the one or more predetermined data schemes each comprise first portions to be obfuscated and second portions to remain unobfuscated, the first portions to be obfuscated corresponding to data locations having more than one permitted character, and the second portions to remain unobfuscated having only one permitted character comprising a delimiter character; and
comprising additional program instructions, based at least on being read and executed by the processing system, to direct the processing system to at least:
identify whether a part of the first portions is designated to remain discernible as to uniqueness after obfuscation, and designate that part to remain unobfuscated.
Similar technologies:
Publication number | Publication date | Patent title
BR112019017319A2|2020-03-31|OBFUSCATION OF USER CONTENT IN STRUCTURED USER DATA FILES
US10410014B2|2019-09-10|Configurable annotations for privacy-sensitive user content
US10671753B2|2020-06-02|Sensitive data loss protection for structured user content viewed in user applications
US10223548B2|2019-03-05|Scrubber to remove personally identifiable information
US8949371B1|2015-02-03|Time and space efficient method and system for detecting structured data in free text
US8225371B2|2012-07-17|Method and apparatus for creating an information security policy based on a pre-configured template
US9436463B2|2016-09-06|System and method for checking open source usage
TWI616762B|2018-03-01|Dynamic data masking method and data library system
US20130312105A1|2013-11-21|Classification of an electronic document
US8365247B1|2013-01-29|Identifying whether electronic data under test includes particular information from a database
US9971809B1|2018-05-15|Systems and methods for searching unstructured documents for structured data
JP7012742B2|2022-01-28|Configurable annotations for privacy-sensitive user content
CN109213850A|2019-01-15|The system and method for determining the text comprising confidential data
US20150088933A1|2015-03-26|Controlling disclosure of structured data
US20200320162A1|2020-10-08|Management of content objects for ingestion by multiple entities
Shi et al.2015|Applicability of probablistic data structures for filtering tasks in data loss prevention systems
CN114117188A|2022-03-01|Search statement analysis method and device based on binary tree and electronic equipment
Family patents:
Publication number | Publication date
CL2019002635A1|2020-01-31|
CO2019009852A2|2019-09-30|
IL268795A|2022-03-01|
SG11201908283TA|2019-10-30|
KR20190129877A|2019-11-20|
AU2018239927B2|2022-01-13|
JP2020516127A|2020-05-28|
US20190332784A1|2019-10-31|
US11182490B2|2021-11-23|
EP3602381A1|2020-02-05|
RU2019133475A3|2021-07-30|
RU2019133475A|2021-04-23|
US20180276393A1|2018-09-27|
US10380355B2|2019-08-13|
ZA201905230B|2020-10-28|
AU2018239927A1|2019-08-22|
CN110447035A|2019-11-12|
MX2019011181A|2019-10-30|
IL268795D0|2019-10-31|
PH12019550176A1|2020-06-29|
WO2018175212A1|2018-09-27|
CA3053651A1|2018-09-27|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title

US6424980B1|1998-06-10|2002-07-23|Nippon Telegraph And Telephone Corporation|Integrated retrieval scheme for retrieving semi-structured documents|
US7127615B2|2000-09-20|2006-10-24|Blue Spike, Inc.|Security based on subliminal and supraliminal channels for data objects|
US7352868B2|2001-10-09|2008-04-01|Philip Hawkes|Method and apparatus for security in a data processing system|
AU2003239490A1|2002-05-14|2003-12-02|Verity, Inc.|Searching structured, semi-structured, and unstructured content|
US7886359B2|2002-09-18|2011-02-08|Symantec Corporation|Method and apparatus to report policy violations in messages|
US20040193910A1|2003-03-28|2004-09-30|Samsung Electronics Co., Ltd.|Security filter for preventing the display of sensitive information on a video display|
WO2005010727A2|2003-07-23|2005-02-03|Praedea Solutions, Inc.|Extracting data from semi-structured text documents|
US20050038788A1|2003-08-14|2005-02-17|International Business Machines Corporation|Annotation security to prevent the divulgence of sensitive information|
EP1521161A3|2003-09-25|2006-03-15|Matsushita Electric Industrial Co., Ltd.|An apparatus and a method for preventing unauthorized use and a device with a function of preventing unauthorized use|
US8261058B2|2005-03-16|2012-09-04|Dt Labs, Llc|System, method and apparatus for electronically protecting data and digital content|
WO2007075573A2|2005-12-16|2007-07-05|The 41St Parameter, Inc.|Methods and apparatus for securely displaying digital images|
EP2033128A4|2006-05-31|2012-08-15|Ibm|Method and system for transformation of logical data objects for storage|
US7724918B2|2006-11-22|2010-05-25|International Business Machines Corporation|Data obfuscation of text data using entity detection and replacement|
US8635691B2|2007-03-02|2014-01-21|403 Labs, Llc|Sensitive data scanner|
US8504553B2|2007-04-19|2013-08-06|Barnesandnoble.Com Llc|Unstructured and semistructured document processing and searching|
US8627403B1|2007-07-31|2014-01-07|Hewlett-Packard Development Company, L.P.|Policy applicability determination|
US20090100527A1|2007-10-10|2009-04-16|Adrian Michael Booth|Real-time enterprise data masking|
US20090132419A1|2007-11-15|2009-05-21|Garland Grammer|Obfuscating sensitive data while preserving data usability|
US7877398B2|2007-11-19|2011-01-25|International Business Machines Corporation|Masking related sensitive data in groups|
US8347396B2|2007-11-30|2013-01-01|International Business Machines Corporation|Protect sensitive content for human-only consumption|
US8280905B2|2007-12-21|2012-10-02|Georgetown University|Automated forensic document signatures|
US8145632B2|2008-02-22|2012-03-27|Tigerlogic Corporation|Systems and methods of identifying chunks within multiple documents|
US7996373B1|2008-03-28|2011-08-09|Symantec Corporation|Method and apparatus for detecting policy violations in a data repository having an arbitrary data schema|
US20090259670A1|2008-04-14|2009-10-15|Inmon William H|Apparatus and Method for Conditioning Semi-Structured Text for use as a Structured Data Source|
US8041695B2|2008-04-18|2011-10-18|The Boeing Company|Automatically extracting data from semi-structured documents|
US8346532B2|2008-07-11|2013-01-01|International Business Machines Corporation|Managing the creation, detection, and maintenance of sensitive information|
US8069053B2|2008-08-13|2011-11-29|Hartford Fire Insurance Company|Systems and methods for de-identification of personal data|
US8200509B2|2008-09-10|2012-06-12|Expanse Networks, Inc.|Masked data record access|
US20100088296A1|2008-10-03|2010-04-08|Netapp, Inc.|System and method for organizing data to facilitate data deduplication|
US8533844B2|2008-10-21|2013-09-10|Lookout, Inc.|System and method for security data collection and analysis|
US8156159B2|2009-02-11|2012-04-10|Verizon Patent And Licensing, Inc.|Data masking and unmasking of sensitive data|
US8863304B1|2009-03-26|2014-10-14|Symantec Corporation|Method and apparatus for remediating backup data to control access to sensitive data|
BR112012005727A2|2009-09-14|2019-09-24|Directv Group Inc|method and system for distributing content.|
US20110219446A1|2010-03-05|2011-09-08|Jeffrey Ichnowski|Input parameter filtering for web application security|
AU2011201369A1|2010-03-25|2011-10-13|Rl Solutions|Systems and methods for redacting sensitive data entries|
US8949184B2|2010-04-26|2015-02-03|Microsoft Technology Licensing, Llc|Data collector|
SG177018A1|2010-06-09|2012-01-30|Smart Communications Inc|System and method for the provision of content to a subscriber|
US8539560B2|2010-06-24|2013-09-17|International Business Machines Corporation|Content protection using automatically selectable display surfaces|
US9298878B2|2010-07-29|2016-03-29|Oracle International Corporation|System and method for real-time transactional data obfuscation|
US8892550B2|2010-09-24|2014-11-18|International Business Machines Corporation|Source expansion for information retrieval and information extraction|
JP5827467B2|2010-11-12|2015-12-02|インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation|Method, apparatus, server, and computer program for masking partial text data in electronic document|
US8601594B2|2010-11-30|2013-12-03|International Business Machines Corporation|Automatically classifying an input from field with respect to sensitivity of information it is designed to hold|
US9003542B1|2010-12-13|2015-04-07|Symantec Corporation|Systems and methods for replacing sensitive information stored within non-secure environments with secure references to the same|
US8862522B1|2010-12-14|2014-10-14|Symantec Corporation|Incremental machine learning for data loss prevention|
US8769200B2|2011-03-01|2014-07-01|Xbridge Systems, Inc.|Method for managing hierarchical storage during detection of sensitive information, computer readable storage media and system utilizing same|
US10534931B2|2011-03-17|2020-01-14|Attachmate Corporation|Systems, devices and methods for automatic detection and masking of private data|
EP2689353B1|2011-03-22|2019-11-06|Informatica LLC|System and method for data masking|
US8930381B2|2011-04-07|2015-01-06|Infosys Limited|Methods and systems for runtime data anonymization|
WO2012144970A1|2011-04-19|2012-10-26|Hewlett-Packard Development Company, L.P.|Obstructing user content based on location|
US8688601B2|2011-05-23|2014-04-01|Symantec Corporation|Systems and methods for generating machine learning-based classifiers for detecting specific categories of sensitive information|
US8806204B2|2011-06-20|2014-08-12|Liaison Technologies, Inc.|Systems and methods for maintaining data security across multiple active domains|
US9104528B2|2011-12-08|2015-08-11|Microsoft Technology Licensing, Llc|Controlling the release of private information using static flow analysis|
US9183212B2|2012-01-26|2015-11-10|Upthere, Inc.|Representing directory structure in content-addressable storage systems|
US8898796B2|2012-02-14|2014-11-25|International Business Machines Corporation|Managing network data|
US8959047B2|2012-05-10|2015-02-17|Check Point Software Technologies Ltd.|Reducing false positives in data validation using statistical heuristics|
US9473532B2|2012-07-19|2016-10-18|Box, Inc.|Data loss prevention methods by a cloud service including third party integration architectures|
IN2015DN01833A|2012-09-07|2015-05-29|Tiversa Ip Inc|
US9489376B2|2013-01-02|2016-11-08|International Business Machines Corporation|Identifying confidential data in a data item by comparing the data item to similar data items from alternative sources|
US8973149B2|2013-01-14|2015-03-03|Lookout, Inc.|Detection of and privacy preserving response to observation of display screen|
US8925099B1|2013-03-14|2014-12-30|Reputation.Com, Inc.|Privacy scoring|
CN104166822B|2013-05-20|2017-10-13|阿里巴巴集团控股有限公司|A kind of method and apparatus of data protection|
US20150040237A1|2013-08-05|2015-02-05|Xerox Corporation|Systems and methods for interactive creation of privacy safe documents|
US9392012B2|2013-11-01|2016-07-12|Bank Of America Corporation|Application security testing system|
US9177174B1|2014-02-06|2015-11-03|Google Inc.|Systems and methods for protecting sensitive data in communications|
US9256727B1|2014-02-20|2016-02-09|Symantec Corporation|Systems and methods for detecting data leaks|
US9542622B2|2014-03-08|2017-01-10|Microsoft Technology Licensing, Llc|Framework for data extraction by examples|
US9330273B2|2014-03-19|2016-05-03|Symantec Corporation|Systems and methods for increasing compliance with data loss prevention policies|
US9785795B2|2014-05-10|2017-10-10|Informatica, LLC|Identifying and securing sensitive data at its source|
US9858440B1|2014-05-23|2018-01-02|Shape Security, Inc.|Encoding of sensitive data|
US10129370B2|2014-08-01|2018-11-13|Protegrity Corporation|Mapping between user interface fields and protocol information|
US9384357B2|2014-10-01|2016-07-05|Quixey, Inc.|Providing application privacy information|
EP3210140A4|2014-10-20|2018-06-06|3M Innovative Properties Company|Identification of codable sections in medical documents|
US9898610B1|2014-10-22|2018-02-20|State Farm Mutual Automobile Insurance Company|System and method for concealing sensitive data on a computing device|
US9697349B2|2014-10-26|2017-07-04|Microsoft Technology Licensing, Llc|Access blocking for data loss prevention in collaborative environments|
US9934406B2|2015-01-08|2018-04-03|Microsoft Technology Licensing, Llc|Protecting private information in input understanding system|
US9454675B2|2015-01-26|2016-09-27|Idis Co., Ltd.|Apparatus and method for protecting personal information of recorded image, and computer-readable recording medium having computer program recorded therein|
US10140343B2|2015-02-09|2018-11-27|Ca, Inc.|System and method of reducing data in a storage system|
US10614113B2|2015-04-16|2020-04-07|Docauthority Ltd.|Structural document classification|
EP3166041A1|2015-11-07|2017-05-10|Tata Consultancy Services Limited|Format preserving masking system and method|
US10282557B1|2015-11-19|2019-05-07|Veritas Technologies Llc|Systems and methods for protecting sensitive data against data loss|
US9904957B2|2016-01-15|2018-02-27|FinLocker LLC|Systems and/or methods for maintaining control over, and access to, sensitive data inclusive digital vaults and hierarchically-arranged information elements thereof|
CN109074496A|2016-06-28|2018-12-21|惠普发展公司,有限责任合伙企业|Hide sensitive data|
US10430610B2|2016-06-30|2019-10-01|International Business Machines Corporation|Adaptive data obfuscation|
US10387670B2|2016-09-21|2019-08-20|International Business Machines Corporation|Handling sensitive data in an application using external processing|
US20180253219A1|2017-03-06|2018-09-06|Microsoft Technology Licensing, Llc|Personalized presentation of content on a computing device|
US10380355B2|2017-03-23|2019-08-13|Microsoft Technology Licensing, Llc|Obfuscation of user content in structured user data files|
US10671753B2|2017-03-23|2020-06-02|Microsoft Technology Licensing, Llc|Sensitive data loss protection for structured user content viewed in user applications|
US10410014B2|2017-03-23|2019-09-10|Microsoft Technology Licensing, Llc|Configurable annotations for privacy-sensitive user content|
US10200331B2|2017-06-28|2019-02-05|Xerox Corporation|Methods and systems for performing structure-preserving obfuscation on emails|
US10541982B1|2016-06-02|2020-01-21|Jpmorgan Chase Bank, N.A.|Techniques for protecting electronic data|
US10671753B2|2017-03-23|2020-06-02|Microsoft Technology Licensing, Llc|Sensitive data loss protection for structured user content viewed in user applications|
US10380355B2|2017-03-23|2019-08-13|Microsoft Technology Licensing, Llc|Obfuscation of user content in structured user data files|
US10410014B2|2017-03-23|2019-09-10|Microsoft Technology Licensing, Llc|Configurable annotations for privacy-sensitive user content|
US10460115B2|2017-05-15|2019-10-29|International Business Machines Corporation|Data anonymity|
US10200331B2|2017-06-28|2019-02-05|Xerox Corporation|Methods and systems for performing structure-preserving obfuscation on emails|
US11269934B2|2018-06-13|2022-03-08|Oracle International Corporation|Regular expression generation using combinatoric longest common subsequence algorithms|
US10817617B1|2018-06-28|2020-10-27|Ca, Inc.|Data loss prevention for biometric data|
US11201889B2|2019-03-29|2021-12-14|Citrix Systems, Inc.|Security device selection based on secure content detection|
US11100087B2|2019-04-26|2021-08-24|Microsoft Technology Licensing, Llc|Data tokenization system maintaining data integrity|
KR102196547B1|2019-05-20|2020-12-29|주식회사 무하유|Method and apparatus for blind processing of specific information in document|
CN111177667B|2019-12-16|2021-08-10|浙江信网真科技股份有限公司|Authority control method and system for content partition processing|
US20210406266A1|2020-06-30|2021-12-30|Microsoft Technology Licensing, Llc|Computerized information extraction from tables|
WO2022041058A1|2020-08-27|2022-03-03|Citrix Systems, Inc.|Privacy protection during video conferencing screen share|
WO2022041163A1|2020-08-29|2022-03-03|Citrix Systems, Inc.|Identity leak prevention|
KR102263111B1|2021-01-15|2021-06-09| 투씨에스지|Method for data security management and recording medium recording program for performing the method|
CN112818390A|2021-01-26|2021-05-18|支付宝信息技术有限公司|Data information publishing method, device and equipment based on privacy protection|
CN113971296A|2021-12-24|2022-01-25|每日互动股份有限公司|ID fuzzification data processing system|
Legal status:
2021-10-13| B350| Update of information on the portal [chapter 15.35 patent gazette]|
Priority:
Application number | Filing date | Patent title
US15/467,029|US10380355B2|2017-03-23|2017-03-23|Obfuscation of user content in structured user data files|
US15/467,029|2017-03-23|
PCT/US2018/022767|WO2018175212A1|2017-03-23|2018-03-16|Obfuscation of user content in structured user data files|